CS 188 Summer 2015 Introduction to Artificial Intelligence Final

- You have approximately 2 hours 50 minutes.
- The exam is closed book, closed calculator, and closed notes except your one-page crib sheet.
- Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. All short answer sections can be successfully answered in a few sentences AT MOST.

First name
Last name
SID
edX username
Name of person on your left
Name of person on your right

For staff use only:
Q1. Search and Probability /10
Q2. Games /8
Q3. Utilities /10
Q4. Farmland CSP /8
Q5. MDP /16
Q6. Bayes Nets /8
Q7. Chameleon /10
Q8. Perceptron /10
Total /80

THIS PAGE IS INTENTIONALLY LEFT BLANK

Q1. [10 pts] Search and Probability

Each True/False question is worth 1 point. Leaving a question blank is worth 0 points. Answering incorrectly is worth -1 point.

(a) Consider a graph search problem where for every action, the cost is at least ε, with ε > 0. Assume the heuristic is admissible.

(i) [1 pt] [true or false] Uniform-cost graph search is guaranteed to return an optimal solution.

True. UCS expands paths in order of least total cost, so the optimal solution is found.

(ii) [1 pt] [true or false] The path returned by uniform-cost graph search may change if we add a positive constant to every step cost.

True. Consider two paths from the start state (S) to the goal (G): S → A → G and S → G, with cost(S, A) = 1, cost(A, G) = 1, and cost(S, G) = 3. The optimal path is through A. Now, if we add 2 to each of the costs, the optimal path is directly from S to G. Since uniform cost search finds the optimal path, its path will change.

(iii) [1 pt] [true or false] A* graph search is guaranteed to return an optimal solution.

False. The heuristic is admissible, but it is not guaranteed to be consistent, which is required for optimal A* graph search.

(iv) [1 pt] [true or false] A* graph search is guaranteed to expand no more nodes than depth-first graph search.

False. Depth-first graph search could, for example, go directly to a sub-optimal solution.

(v) [1 pt] [true or false] If h1(s) and h2(s) are two admissible heuristics, then their average f(s) = 1/2 h1(s) + 1/2 h2(s) must also be admissible.

True. Let h*(s) be the true distance from s. We know that h1(s) ≤ h*(s) and h2(s) ≤ h*(s), thus h_avg(s) = 1/2 h1(s) + 1/2 h2(s) ≤ 1/2 h*(s) + 1/2 h*(s) = h*(s).

(b) [3 pts] A, B, C, and D are random variables with binary domains. How many entries are in the following probability tables and what is the sum of the values in each table? Write a ? in the box if there is not enough information given.

Table | Size | Sum
P(A | B) | 4 | 2
P(A, D | +b, +c) | 4 | 1
P(B | +a, C, D) | 8 | 4

(c) [2 pts] Write all the possible chain rule expansions of the joint probability P(a, b, c). No conditional independence assumptions are made.

P(a)P(b|a)P(c|a,b), P(a)P(c|a)P(b|a,c), P(b)P(a|b)P(c|a,b), P(b)P(c|b)P(a|b,c), P(c)P(b|c)P(a|b,c), P(c)P(a|c)P(b|a,c)
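Not part of the original exam: a minimal uniform-cost graph search sketch in Python that replays the example from the solution to (a)(ii); the three-node graph, its costs, and the +2 shift are taken from that solution.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """graph: dict mapping node -> list of (neighbor, step_cost) pairs."""
    frontier = [(0, start, [start])]          # (path cost so far, node, path)
    closed = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in closed:
            continue
        closed.add(node)
        for nbr, step in graph[node]:
            if nbr not in closed:
                heapq.heappush(frontier, (cost + step, nbr, path + [nbr]))
    return None

# Example from (a)(ii): cost(S, A) = cost(A, G) = 1, cost(S, G) = 3.
graph = {'S': [('A', 1), ('G', 3)], 'A': [('G', 1)], 'G': []}
print(uniform_cost_search(graph, 'S', 'G'))     # (2, ['S', 'A', 'G'])

# Add 2 to every step cost: the optimal (and returned) path changes.
shifted = {n: [(m, c + 2) for m, c in nbrs] for n, nbrs in graph.items()}
print(uniform_cost_search(shifted, 'S', 'G'))   # (5, ['S', 'G'])
```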

Q2. [8 pts] Games

For the following game tree, each player maximizes their respective utility. Let x, y respectively denote the top and bottom values in a node. Player 1 uses the utility function U1(x, y) = x.

[Game tree figure: a Player 1 max node at the root with three Player 2 nodes as children, each leading to leaves holding (x, y) value pairs; most leaf values were lost in extraction.]

(a) Both players know that Player 2 uses the utility function U2(x, y) = x - y.

(i) [2 pts] Fill in the rectangles in the figure above with the pair of values returned by each max node.

From top-down, left-right: (6, 2), (6, 2), (_, 0), (5, _) (the values marked _ were lost with the figure).

(ii) [2 pts] You want to save computation time by using pruning in your game tree search. On the game tree above, put an X on branches that do not need to be explored, or simply write None. Assume that branches are explored from left to right.

None.

Figure repeated for convenience: [the same game tree as on the previous page].

(b) Now assume Player 2 changes their utility function based on their mood. The probabilities of Player 2's utilities and mood are described in the following table. Let M, U2 respectively denote the mood and utility function of Player 2.

P(M = happy) = a
P(M = mad) = b

 | M = happy | M = mad
P(U2(x, y) = x | M) | c | f
P(U2(x, y) = x - y | M) | d | g
P(U2(x, y) = x^2 + y^2 | M) | e | h

(i) [4 pts] Calculate the maximum expected utility of the game for Player 1 in terms of the values in the game tree and the tables. It may be useful to record and label your intermediate calculations. You may write your answer in terms of a max function.

We first calculate the new probabilities of each utility function:

P(U2(x, y) = x) = ac + bf
P(U2(x, y) = x - y) = ad + bg
P(U2(x, y) = x^2 + y^2) = ae + bh

EU(Left Branch) = (ac + bf)(_) + (ad + bg)(6) + (ae + bh)(6)
EU(Middle Branch) = (ac + bf)(_) + (ad + bg)(_) + (ae + bh)(7)
EU(Right Branch) = (ac + bf)(_) + (ad + bg)(5) + (ae + bh)(_)
MEU = max(EU(Left Branch), EU(Middle Branch), EU(Right Branch))

(The coefficients marked _ are Player 1 values taken from leaves that were lost with the figure.)
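A small Python sketch, not from the exam, of the computation pattern in (b)(i): for each branch and each candidate opponent utility, Player 2 picks the leaf maximizing that utility, and Player 1 receives the x value of that leaf weighted by the probability of the utility. The leaf values below are hypothetical placeholders, since the original tree's leaves were lost with the figure.

```python
# Hypothetical leaves per Player 2 branch (each leaf is an (x, y) pair).
branches = [[(8, 6), (6, 2)],        # placeholder values, not the exam's
            [(7, 7), (4, 0)],
            [(5, 1), (2, 5)]]

# Candidate utilities for Player 2 with their symbolic probabilities.
utilities = [("x",         lambda x, y: x,          "ac + bf"),
             ("x - y",     lambda x, y: x - y,      "ad + bg"),
             ("x^2 + y^2", lambda x, y: x*x + y*y,  "ae + bh")]

for i, leaves in enumerate(branches):
    terms = []
    for _, u2, prob in utilities:
        # Player 2 picks the leaf maximizing its own utility u2;
        # Player 1 then receives that leaf's x value.
        x_chosen, _ = max(leaves, key=lambda leaf: u2(*leaf))
        terms.append(f"({prob})({x_chosen})")
    print(f"EU(branch {i + 1}) = " + " + ".join(terms))
# MEU for Player 1 is the max of the three expected utilities above.
```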

Q3. [10 pts] Utilities

Davis is on his way to a final exam planning meeting. He is already running late (the meeting is starting now) and he's trying to determine whether he should wait for the bus or just walk. It takes 20 minutes to get to Cory Hall by walking, and only 5 minutes to get there by bus. The bus will either come in 10, 20, or 30 minutes, each with probability 1/3.

(a) [3 pts] Davis hates being late; his utility for being late as a function of t, the number of minutes late he is, is

U_D(t) = 0 for t ≤ 0, and U_D(t) = -2^(t/5) for t > 0

What is the expected utility of each action? Should he wait for the bus or walk?

EU(walk) = -2^(20/5) = -16
EU(bus) = 1/3 (-2^((10+5)/5) - 2^((20+5)/5) - 2^((30+5)/5)) = -1/3 (8 + 32 + 128) = -168/3 = -56
Davis should walk.

(b) [3 pts] Pat is running late too. However, Pat reasons that once he's late, it doesn't matter how late he is. Therefore, his utility function is

U_P(t) = 0 for t ≤ 0, and U_P(t) = -10 for t > 0

Moreover, Pat prefers riding the bus because it is more comfortable, so riding the bus incurs a utility bonus of 5. If Pat is deciding whether to take the bus or walk when the meeting is just starting, what are his expected utilities for each action? Should he take the bus or walk?

EU(walk) = -10
EU(bus) = -10 + 5 = -5
Pat should take the bus.

(c) [2 pts] Give an example of a decreasing utility function in terms of time such that it will favor decisions that always minimize expected time to get to the meeting.

U(t) = -t. Any decreasing linear function of t is correct.

(d) [2 pts] Give an example of a decreasing utility function in terms of time such that it will be risk-seeking; that is, a lottery with expected time of arrival t will be preferred to a guarantee of arrival time t.

U(t) = 1/t. Any decreasing function with a positive second derivative (concave up) is correct.
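A quick Python check, not part of the exam, of the expected-utility arithmetic in (a) and (b), using the utility functions and the 1/3-each bus arrival distribution given above; Pat's +5 bus bonus is applied as stated in (b).

```python
from fractions import Fraction

arrivals = [10, 20, 30]            # possible bus arrival times, probability 1/3 each
WALK_TIME, BUS_RIDE = 20, 5

def u_davis(t):
    # integer exponent is fine here since every lateness t is a multiple of 5
    return 0 if t <= 0 else -(2 ** (t // 5))

def u_pat(t):
    return 0 if t <= 0 else -10

# The meeting starts now, so lateness equals total travel (waiting + riding) time.
eu_walk_davis = u_davis(WALK_TIME)
eu_bus_davis = sum(Fraction(1, 3) * u_davis(a + BUS_RIDE) for a in arrivals)
print(eu_walk_davis, eu_bus_davis)    # -16 and -56: Davis should walk

eu_walk_pat = u_pat(WALK_TIME)
eu_bus_pat = sum(Fraction(1, 3) * (u_pat(a + BUS_RIDE) + 5) for a in arrivals)
print(eu_walk_pat, eu_bus_pat)        # -10 and -5: Pat should take the bus
```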

Q4. [8 pts] Farmland CSP

The animals in Farmland aren't getting along and the farmers have to assign them to different pens. To avoid fighting, animals of the same type cannot be in connected pens. Fortunately, the Farmland pens are connected in a tree structure.

(a) [2 pts] Consider the following constraint diagram that shows six pens with lines indicating connected pens. The remaining domains for each pen are listed below each node.

[Constraint graph figure lost in extraction: pens 1-6 connected in a tree, each annotated with its remaining domain drawn from {Bull, Goat, Duck}, with directed arcs marked.]

After assigning a bull to pen 5, enforce arc consistency on this CSP considering only the directed arcs shown in the figure. What are the remaining values for each pen?

Pen | Values
1 | Bull
2 | Goat
3 | Duck, Goat
4 | Bull, Duck
5 | Bull
6 | Duck, Goat

(b) [2 pts] What is the computational complexity of solving general tree-structured CSPs with n nodes and d values in the domain? Give an answer of the form O( ).

O(nd^2)

(c) This True/False question is worth 1 point. Leaving a question blank is worth 0 points. Answering incorrectly is worth -1 point.

(i) [1 pt] [true or false] If root-to-leaf arcs are consistent on a general tree-structured CSP, assigning values to nodes from root to leaves will not back-track if a solution exists.

True. Because the arcs are consistent, there is a valid value no matter which parent value was assigned.

(d) [3 pts] Given 3 animal types, what is the most number of pens a tree structure could have, such that the computational complexity to solve the tree CSP is no greater than the computational complexity to solve a fully connected CSP with 10 pens?

3^8. A fully connected CSP is O(d^n), while a tree structure is O(nd^2). The intent of this question was to show that you could have 3^8 nodes in a tree structure and that would be roughly the same amount of computation as a fully connected problem with 10 nodes (3^10 = 3^8 * 3^2). Unfortunately, this question is poorly worded, because computational complexity doesn't quite work with specific values like this.
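A tiny Python check, not from the exam, of the back-of-the-envelope comparison in (d): with d = 3, a tree-structured CSP costs on the order of n * d^2 while a fully connected CSP over 10 variables costs on the order of d^10, so the break-even number of tree nodes is 3^8.

```python
d = 3                                  # domain size (3 animal types)
full_cost = d ** 10                    # ~ fully connected CSP with 10 pens
tree_cost = lambda n: n * d ** 2       # ~ tree-structured CSP with n pens

n = 1
while tree_cost(n + 1) <= full_cost:   # grow the tree until it exceeds the full CSP cost
    n += 1
print(n, 3 ** 8)                       # both print 6561
```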

Q5. [16 pts] MDP

Pacman is using MDPs to maximize his expected utility. In each environment:

- Pacman has the standard actions {North, East, South, West} unless blocked by an outer wall
- There is a reward of 1 point when eating the dot (for example, in the grid below, R(C, South, F) = 1)
- The game ends when the dot is eaten

(a) Consider the following grid where there is a single food pellet in the bottom right corner (F). The discount factor is 0.5. There is no living reward. The states are simply the grid locations.

A B C
D E F

(i) [2 pts] What is the optimal policy for each state?

State | π(state)
A | East or South
B | East or South
C | South
D | East
E | East

(ii) [2 pts] What is the optimal value for the state of being in the upper left corner (A)? Reminder: the discount factor is 0.5.

V*(A) = 0.25

k | V(A) | V(B) | V(C) | V(D) | V(E) | V(F)
0 | 0 | 0 | 0 | 0 | 0 | 0
1 | 0 | 0 | 1 | 0 | 1 | 0
2 | 0 | 0.5 | 1 | 0.5 | 1 | 0
3 | 0.25 | 0.5 | 1 | 0.5 | 1 | 0
4 | 0.25 | 0.5 | 1 | 0.5 | 1 | 0

(iii) [2 pts] Using value iteration with the value of all states equal to zero at k = 0, for which iteration k will V_k(A) = V*(A)?

k = 3 (see the table above)
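A short value-iteration sketch in Python, not part of the exam, for the 2 x 3 grid above, assuming deterministic moves, a reward of 1 on the step that eats the dot at F, F treated as terminal, and discount 0.5; it reproduces the table and gives V(A) = 0.25 from k = 3 on.

```python
GAMMA = 0.5
pos = {'A': (0, 0), 'B': (0, 1), 'C': (0, 2),
       'D': (1, 0), 'E': (1, 1), 'F': (1, 2)}
moves = {'North': (-1, 0), 'South': (1, 0), 'East': (0, 1), 'West': (0, -1)}

def step(s, a):
    """Deterministic transition; walking into a wall leaves Pacman in place."""
    r, c = pos[s]
    dr, dc = moves[a]
    nxt = next((k for k, v in pos.items() if v == (r + dr, c + dc)), s)
    return nxt, (1 if nxt == 'F' else 0)        # reward for eating the dot

V = {s: 0.0 for s in pos}
for k in range(1, 5):
    V = {s: 0.0 if s == 'F' else                # F is terminal
         max(rew + GAMMA * V[n] for n, rew in (step(s, a) for a in moves))
         for s in V}
    print(k, [V[s] for s in 'ABCDEF'])
# From k = 3 onward, V(A) = 0.25 = V*(A).
```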

(b) Consider a new Pacman level that begins with cherries in locations D and F. Landing on a grid position with cherries is worth 5 points, and then the cherries at that position disappear. There is still one dot, worth 1 point. The game still only ends when the dot is eaten.

[Level figure lost in extraction: six cells, with D, E, F along the bottom row (cherries at D and F), a fourth bottom-row cell to the left of D, and two top-row cells holding Pacman's start position and the dot; the letter labels of those three cells were lost.]

(i) [2 pts] With no discount (γ = 1) and a living reward of -1, what is the optimal policy for the states in this level's state space?

State | π(state)
(start cell, label lost) | South
(cell left of D, label lost) | East
D, FCherry = true | East
D, FCherry = false | North
E, FCherry = true | East
E, FCherry = false | West
F | West

Larger state spaces with equivalent states and actions are possible too. For example, with a state representation of (grid position, D-cherry, F-cherry), there could be up to 24 different states, where, e.g., all four states for a single grid cell are the same, etc.

(ii) [2 pts] With no discount (γ = 1), what is the range of living reward values such that Pacman eats exactly one cherry when starting at the start position?

The valid range for the living reward is (-2.5, -1.25). Let x equal the living reward. The reward for eating zero cherries (one step straight onto the dot) is x + 1 (one step plus food). The reward for eating exactly one cherry (three steps, grabbing the cherry at D on the way to the dot) is 3x + 6 (three steps plus cherry plus food). The reward for eating two cherries (seven steps, out through D, E, F and back through E, D to the dot) is 7x + 11 (seven steps plus two cherries plus food). x must be greater than -2.5 to make eating at least one cherry worth it (3x + 6 > x + 1). x must be less than -1.25 so that eating one cherry beats eating two (3x + 6 > 7x + 11).

(c) Quick reinforcement learning questions [PLEASE WRITE CLEARLY]:

(i) [1 pt] What is the difference between value iteration and TD learning?

Value iteration uses explicit models of the transitions and rewards, while TD learning relies on samples gathered while acting.

(ii) [1 pt] What is the difference between TD learning and Q-learning?

TD learning stores and updates V(s), while Q-learning stores and updates Q(s, a). Also, Q-learning is able to learn good policies despite random or suboptimal actions, while TD-learning values are affected by the actions taken.

(iii) [1 pt] What is the purpose of using a learning rate (α) during Q-learning?

The learning rate allows us to average information from previous iterations with the current sample. It lets us step toward a solution at an incremental rate, incorporating noisy samples while moving away from poor initial estimates.

(iv) [1 pt] In value iteration, we store the value of each state. What do we store during approximate Q-learning?

We update and store the weights associated with the features.

(v) [2 pts] Give one advantage and one disadvantage of using approximate Q-learning rather than standard Q-learning.

Pros: the feature representation scales to very large or infinite state spaces, and learning generalizes from seen states to unseen states. Cons: the true Q function may not be representable in the chosen form, learning may not converge, and the feature functions need to be designed.
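A minimal tabular Q-learning update in Python, not from the exam, illustrating the role of the learning rate α discussed in (c)(iii): each observed sample (s, a, r, s') is blended into the running estimate instead of replacing it.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 1.0
ACTIONS = ['North', 'East', 'South', 'West']
Q = defaultdict(float)                 # Q[(state, action)], initialized to 0

def q_update(s, a, r, s_next, terminal=False):
    """One Q-learning step: Q(s,a) <- (1 - alpha) Q(s,a) + alpha * sample."""
    sample = r if terminal else r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * sample

# Hypothetical experience: stepping East from E onto the dot (+1, terminal).
q_update('E', 'East', 1, 'F', terminal=True)
q_update('E', 'East', 1, 'F', terminal=True)
print(Q[('E', 'East')])                # 0.75: the estimate moves toward 1 at rate alpha
```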

Q6. [8 pts] Bayes Nets

(a) For the following graphs, explicitly state the minimum-size set of edges that must be removed such that the corresponding independence relations are guaranteed to be true. Mark the removed edges with an X on the graphs.

(i) [2 pts] [Graph, requested independence relation, and answer lost in extraction.]

(ii) [2 pts] [Graph and requested independence relations lost in extraction; the answer removes one edge incident to D together with either edge EF or one alternative edge.]

(b) You're performing variable elimination over a Bayes Net with variables A, B, C, D, E. So far, you've finished joining over A (but not summing out), when you realize you've lost the original Bayes Net! Your current factors are f(B), f(C), f(B, D), f(A, B, C, D, E). Note: these are factors, NOT joint distributions. You don't know which variables are conditioned or unconditioned.

(i) [2 pts] What's the smallest number of edges that could have been in the original Bayes Net? Draw one such Bayes Net below.

Number of edges = 5

The original Bayes net must have had 5 factors, one for each node. f(B) and f(C) must have corresponded to nodes B and C, and indicate that neither B nor C has any parents. f(B, D), then, must correspond to node D, and indicates that D has only B as a parent. Since there is only one factor left, f(A, B, C, D, E), for the nodes A and E, those two nodes must have been joined while you were joining over A. This implies two things: 1) E must have had A as a parent, and 2) every other node must have been a parent of either A or E. [Figure lost in extraction: one such minimal Bayes net, with B → D, A → E, and each of B, C, D a parent of exactly one of A or E.]

(ii) [2 pts] What's the largest number of edges that could have been in the original Bayes Net? Draw one such Bayes Net below.

Number of edges = 8

The constraints are the same as outlined in part (i). To maximize the number of edges, we make each of B, C, and D a parent of both A and E, as opposed to a parent of only one of them. [Figure lost in extraction: the only such maximal Bayes net, with B → D, A → E, and each of B, C, D a parent of both A and E.]
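A small Python sketch, not from the exam, of what "joining over A" means in variable elimination: every factor mentioning A is multiplied pointwise into a single factor over the union of their variables, which is why one factor f(A, B, C, D, E) can absorb both A's and E's original conditional probability tables. The toy numbers below are hypothetical.

```python
from itertools import product

def join(factors, domains):
    """Pointwise product of factors; each factor is (variables, {assignment: value})."""
    vars_out = tuple(sorted({v for vs, _ in factors for v in vs}))
    joined = {}
    for assignment in product(*(domains[v] for v in vars_out)):
        row = dict(zip(vars_out, assignment))
        value = 1.0
        for vs, table in factors:
            value *= table[tuple(row[v] for v in vs)]
        joined[assignment] = value
    return vars_out, joined

domains = {'A': [0, 1], 'E': [0, 1]}
f_A = (('A',), {(0,): 0.6, (1,): 0.4})                       # e.g. P(A)
f_E_given_A = (('A', 'E'), {(0, 0): 0.9, (0, 1): 0.1,
                            (1, 0): 0.2, (1, 1): 0.8})       # e.g. P(E | A)
print(join([f_A, f_E_given_A], domains))    # one factor over (A, E)
```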

Q7. [10 pts] Chameleon

A team of scientists from Berkeley discovers a rare species of chameleon. Each one can change its color to blue or gold, once a day. The probability of its color on a certain day is determined solely by its color on the previous day. The team spends 5 days observing 10 chameleons changing color from day to day. The recorded counts of the chameleons' color transitions are below (columns give the day t on which each transition out of day t was observed):

Transition | t = 0 | t = 1 | t = 2 | t = 3
gold → gold | 0 | 0 | 8 | 2
gold → blue | 7 | 0 | 0 | 8
blue → gold | 0 | 8 | 2 | 0
blue → blue | 3 | 2 | 0 | 0

(a) [3 pts] They suspect that this phenomenon obeys the stationarity assumption, that is, the transition probabilities are actually the same between all the days. Estimate the transition probabilities P(C_{t+1} | C_t) from the above observations.

P(C_{t+1} = gold | C_t = gold) = 10/25 = 2/5
P(C_{t+1} = blue | C_t = gold) = 15/25 = 3/5
P(C_{t+1} = gold | C_t = blue) = 10/15 = 2/3
P(C_{t+1} = blue | C_t = blue) = 5/15 = 1/3

To solve this problem, find the total number of chameleons that were gold (8 + 2 + 7 + 8 = 25) and then split it into those that turned gold (8 + 2 = 10) and those that turned blue (7 + 8 = 15). Normalizing yields 10/25 and 15/25 for the first two probabilities. Repeat for the chameleons that were blue. One common mistake was incorrectly normalizing the probability table (e.g. dividing by 40 instead of 25). Another was to use only the transitions at t = 1 and t = 3 to get 0.2, 0.8, 0.8, 0.2, which fails to account for the other observed transitions at t = 0 and t = 2.

(b) [2 pts] Further scientific tests determine that these chameleons are, in fact, immortal. As a result, the scientists want to determine the distribution of a chameleon's colors over an infinite amount of time. Given the estimated transition probabilities, what is the steady state distribution P(C_∞)?

P(C_∞ = gold) = 10/19
P(C_∞ = blue) = 9/19

Let g = P(C_∞ = gold) and b = P(C_∞ = blue).
g = (2/5) g + (2/3) b, so (3/5) g = (2/3) b, so g = (10/9) b.
Combining with g + b = 1: (10/9) b + b = (19/9) b = 1, so b = 9/19 and g = 10/19.
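A short check in Python, not part of the exam, of the steady-state answer in (b), solving the stationarity equation g = P(gold | gold) g + P(gold | blue) (1 - g) directly with the estimates from (a).

```python
from fractions import Fraction as F

p_gold_given_gold = F(2, 5)        # estimated in part (a)
p_gold_given_blue = F(2, 3)

# Stationarity: g = p_gg * g + p_gb * (1 - g), with b = 1 - g.
g = p_gold_given_blue / (1 - p_gold_given_gold + p_gold_given_blue)
print(g, 1 - g)                    # 10/19 and 9/19
```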

The chameleons, realizing that these tests are being performed, decide to hide. The scientists can no longer observe them directly, but they can observe the bugs that one particular chameleon likes to eat. They know that the chameleon's color influences the probability that it will eat some fraction of a nest. The scientists will observe the size of the nests twice per day: once in the morning, before the chameleon eats, and once in the evening, after the chameleon eats. Every day, the chameleon moves on to a new nest.

(c) [1 pt] Draw a DBN using the variables C_t, C_{t+1}, M_t, M_{t+1}, E_t, and E_{t+1}. C refers to the color of the chameleon, M is the size of a nest in the morning, and E is the size of that nest in the evening.

[DBN figure lost in extraction; the drawn answer has edges C_t → C_{t+1}, C_t → E_t, M_t → E_t, and likewise C_{t+1} → E_{t+1}, M_{t+1} → E_{t+1}.]

When the chameleon is blue, it eats half of the bugs in the chosen nest with probability 1/2, one-third of the bugs with probability 1/4, and two-thirds of the bugs with probability 1/4. When the chameleon is gold, it eats one-third, half, or two-thirds of the bugs, each with probability 1/3.

(d) [4 pts] You would like to use particle filtering to guess the chameleon's color based on the observations of M and E. You observe the following population sizes: M_1 = 24, E_1 = 12, M_2 = 36, and E_2 = 24. Fill in the following tables with the weights you would assign to particles in each state at each time step.

State at t = 1 | Weight
Blue | 1/2
Gold | 1/3

State at t = 2 | Weight
Blue | 1/4
Gold | 1/3

The weights in HMM particle filtering are exactly equal to P(emission | parents of emission). In this problem, the change is that the emission depends on an additional parent, the number of morning bugs. For the blue state at t = 1, the weight is equal to P(E_1 = 12 | M_1 = 24, C_1 = blue) = 1/2. One extra step that was commonly taken was to normalize the weights afterwards to 1 or some other number; this is extraneous, as the resample step of particle filtering only depends on the relative (not absolute) weights of the particles.
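A tiny Python sketch, not from the exam, of the weighting step in (d): each particle's weight is the emission probability P(E | M, C) under the eating model stated above.

```python
from fractions import Fraction as F

# P(fraction of the nest eaten | color), as given in the problem.
EAT = {'blue': {F(1, 2): F(1, 2), F(1, 3): F(1, 4), F(2, 3): F(1, 4)},
       'gold': {F(1, 3): F(1, 3), F(1, 2): F(1, 3), F(2, 3): F(1, 3)}}

def weight(color, morning, evening):
    """Particle weight = P(E = evening | M = morning, C = color)."""
    fraction_eaten = F(morning - evening, morning)
    return EAT[color].get(fraction_eaten, F(0))

print(weight('blue', 24, 12), weight('gold', 24, 12))   # 1/2 and 1/3
print(weight('blue', 36, 24), weight('gold', 36, 24))   # 1/4 and 1/3
```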

Q8. [10 pts] Perceptron

We would like to use a perceptron to train a classifier for datasets with 2 features per point and labels +1 or -1. Consider the following labeled training data:

Features (x1, x2) | Label y
(-_, 2) | _
(_, -_) | -1
(_, 2) | -1
(_, _) | _
(entries marked _ were lost in extraction)

(a) [2 pts] Our two perceptron weights have been initialized to w1 = 2 and w2 = 2. After processing the first point with the perceptron algorithm, what will be the updated values for these weights?

The first point is misclassified: the prediction g(w1 x1 + w2 x2) disagrees with its label y. We therefore add the label-scaled data point to the weights: w1 = 2 + y x1 and w2 = 2 + y x2 (the numeric coordinates, and hence the resulting weights, were lost in extraction).

(b) [2 pts] After how many steps will the perceptron algorithm converge? Write "never" if it will never converge. Note: one step means processing one point. Points are processed in order and then repeated, until convergence.

The data is not separable, so it will never converge.

(c) Instead of the standard perceptron algorithm, we decide to treat the perceptron as a single-node neural network and update the weights using gradient descent on the loss function. The loss function for one data point is Loss(y, y*) = (y* - y)^2, where y is the training label for a given point and y* is the output of our single-node network for that point.

(i) [3 pts] Given a general activation function g(z) and its derivative g'(z), what is the derivative of the loss function with respect to w1 in terms of g, g', y, x1, x2, w1, and w2?

dLoss/dw1 = 2 (g(w1 x1 + w2 x2) - y) g'(w1 x1 + w2 x2) x1

(ii) [2 pts] For this question, the specific activation function that we will use is: g(z) = 1 if z ≥ 0, and g(z) = -1 if z < 0.

Given the following gradient descent equation to update the weights given a single data point, with initial weights of w1 = 2 and w2 = 2, what are the updated weights after processing the first point?

Gradient descent update equation: w_i = w_i - α dLoss/dw_i

Because the gradient of g is zero, the weights will stay w1 = 2 and w2 = 2.

(iii) [1 pt] What is the most critical problem with this gradient descent training process with that activation function?

The gradient of that activation function is zero, so the weights will not update.
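A minimal perceptron-update sketch in Python, not from the exam. The dataset below is a hypothetical stand-in, since most coordinates of the original training set were lost in extraction; it only illustrates the mistake-driven update w <- w + y x used in (a) and the convergence check relevant to (b).

```python
def sign(z):
    return 1 if z >= 0 else -1

def perceptron_step(w, x, y):
    """Process one point: update w <- w + y * x only if the prediction is wrong."""
    if sign(w[0] * x[0] + w[1] * x[1]) != y:
        w = (w[0] + y * x[0], w[1] + y * x[1])
    return w

# Hypothetical data (NOT the exam's): two features per point, labels +1 / -1.
data = [((-1, 2), 1), ((1, -1), -1), ((1, 2), -1), ((3, 1), 1)]

w, converged_at = (2, 2), None
for epoch in range(1000):                    # process points in order, repeatedly
    mistakes = 0
    for x, y in data:
        new_w = perceptron_step(w, x, y)
        mistakes += (new_w != w)
        w = new_w
    if mistakes == 0:                        # a full pass with no updates
        converged_at = epoch
        break
print(w, converged_at)   # converged_at stays None if the data is not linearly separable
```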

THIS PAGE IS INTENTIONALLY LEFT BLANK